Database Service
This document describes the DatabaseService component, the central MongoDB abstraction layer in the SuperSet Telegram Notification Bot. The service exposes a protocol-like interface for MongoDB operations and encapsulates data persistence for notices, jobs, placement offers, users, and policies. Keeping data access separate from the business services allows dependency injection, easier testing, and modular operation across the system.
Key responsibilities include:
Centralized MongoDB access via DBClient
Notice lifecycle management (existence checks, insertion, retrieval, marking as sent)
Structured job upsert operations
Complex merge logic for placement offers with role merging and package comparisons
User management (add/reactivate/deactivate)
Policy operations (upsert by year)
Serialization of ObjectId fields for JSON transport
Error handling and logging
The DatabaseService resides in the services layer and collaborates with the DBClient for raw MongoDB connectivity. It integrates with other system components through dependency injection, enabling flexible orchestration in both CLI commands and runtime services.
Architecture diagram (text summary):
Components: DatabaseService (app/services/database_service.py), NotificationService (app/services/notification_service.py), NotificationRunner (app/runners/notification_runner.py), OfficialPlacementService (app/services/official_placement_service.py)
Clients layer: DBClient (app/clients/db_client.py)
Core: Config (app/core/config.py)
CLI: main.py (app/main.py)
MongoDB collections: Notices, Jobs, PlacementOffers, Users, Policies, OfficialPlacementData
Dependencies: main.py, NotificationRunner, NotificationService, and OfficialPlacementService each call DatabaseService; DatabaseService calls DBClient; DBClient accesses all six collections; Config supplies settings to DBClient.
DatabaseService: Implements MongoDB operations for notices, jobs, placement offers, users, and policies. Provides CRUD methods, merge logic for placement offers, and serialization helpers.
DBClient: Handles MongoDB connection establishment, collection initialization, and connection lifecycle.
Configuration: Centralized settings management including MongoDB connection string and logging configuration.
Key capabilities:
Notice management: existence checks, insertions, retrieval, unsent notice enumeration, and marking as sent.
Structured job upsert: merges incoming job data with existing records.
Placement offers: bulk save with sophisticated merge logic for roles and student packages, emitting events for downstream processing.
User management: add/reactivate/deactivate users with soft delete semantics.
Policy operations: upsert by year with change detection.
Serialization: converts ObjectId fields to strings for JSON transport.
The DatabaseService follows a layered architecture:
Clients layer: DBClient manages raw MongoDB connectivity and exposes typed collection properties.
Services layer: DatabaseService wraps DBClient and adds business logic for CRUD operations, merge algorithms, and serialization.
Orchestration layer: CLI commands and runners instantiate DBClient and DatabaseService, passing them to dependent services.
Sequence diagram (text summary), with participants CLI (main.py), NotificationRunner (notification_runner.py), DatabaseService, DBClient, and the MongoDB collections:
The CLI creates a NotificationRunner.
The runner initializes a DBClient and calls connect().
The runner initializes DatabaseService with that DBClient.
The runner calls get_unsent_notices(); DatabaseService queries MongoDB for unsent notices and returns them as a list.
The runner calls mark_as_sent(_id); DatabaseService issues an update_one that marks the notice as sent, MongoDB acknowledges the write, and DatabaseService reports success.
The runner returns the send results to the CLI.
DatabaseService Class
The DatabaseService class encapsulates all MongoDB operations and acts as the primary data access object for notices, jobs, placement offers, users, and policies.
Notice Operations
The notice subsystem provides lifecycle management for notifications:
Existence checks by unique notice id
Bulk retrieval of notice ids for efficient lookups
Insertion with automatic timestamps and sent flags
Retrieval by id and chronological enumeration of unsent notices
Marking as sent with timestamps
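The lifecycle above can be sketched with an in-memory stand-in for the notices collection. This is a minimal illustration, not the service's actual code: get_unsent_notices and mark_as_sent are named in this document, while the class name, the other method names, and the field names (id, created_at, sent, sent_at) are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Set


class InMemoryNotices:
    """Hypothetical stand-in for the notices collection, keyed by notice id."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def notice_exists(self, notice_id: str) -> bool:
        return notice_id in self._docs

    def get_all_notice_ids(self) -> Set[str]:
        # One bulk read lets callers test many ids without a query per id.
        return set(self._docs)

    def save_notice(self, notice: Dict[str, Any]) -> bool:
        """Insert once; stamp the creation time and start in the unsent state."""
        notice_id = notice["id"]
        if self.notice_exists(notice_id):
            return False
        self._docs[notice_id] = {
            **notice,
            "created_at": datetime.now(timezone.utc),
            "sent": False,
            "sent_at": None,
        }
        return True

    def get_unsent_notices(self) -> List[Dict[str, Any]]:
        """Return unsent notices, oldest first."""
        unsent = [d for d in self._docs.values() if not d["sent"]]
        return sorted(unsent, key=lambda d: d["created_at"])

    def mark_as_sent(self, notice_id: str) -> bool:
        doc = self._docs.get(notice_id)
        if doc is None:
            return False
        doc["sent"] = True
        doc["sent_at"] = datetime.now(timezone.utc)
        return True
```

A caller would typically fetch the known ids once, skip incoming notices whose id is already present, and save the rest.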
Structured Job Operations
Structured job upsert maintains normalized job listings:
Existence checks by structured id
Upsert logic that merges incoming data with existing records
Timestamp management for created/updated records
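As a rough sketch of the upsert behavior described above, the function below merges an incoming job into a plain dictionary that stands in for the jobs collection. The function name, the id key, and the created_at/updated_at field names are assumptions for illustration only.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def upsert_structured_job(
    store: Dict[str, Dict[str, Any]], job: Dict[str, Any]
) -> Tuple[bool, str]:
    """Merge `job` into `store` by id. Returns (success, "inserted" or "updated")."""
    now = datetime.now(timezone.utc)
    job_id = job["id"]
    existing = store.get(job_id)
    if existing is None:
        store[job_id] = {**job, "created_at": now, "updated_at": now}
        return True, "inserted"
    # Incoming fields overwrite stored ones; fields absent from `job` survive.
    # created_at is kept from the first insert, updated_at is refreshed.
    store[job_id] = {
        **existing,
        **job,
        "created_at": existing["created_at"],
        "updated_at": now,
    }
    return True, "updated"
```

Against a real collection, the same effect is commonly achieved with PyMongo's update_one using $set for the incoming fields, $setOnInsert for created_at, and upsert=True; whether the service does exactly that is not stated here.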
Placement Offer Processing with Merge Logic
The placement offers subsystem implements a complex merge algorithm designed to:
Group offers by company name
Merge roles with package comparison (higher package wins)
Merge students with package comparison and role updates
Track newly added students for event emission
Emit events for new offers and updated offers
Merge flow (text summary of the flowchart), applied to each offer in turn:
If an existing document is found: merge roles (higher package wins); merge students by enrollment/name (higher package wins, role updated if provided); compute the total number of students; apply $set on roles, students, totals, and updated_at via update_one by _id; emit an update event if new students were added.
If no existing document is found: insert_one with saved_at, then emit a new offer event.
After the last offer, return the counts and events.
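The role and student merge rules can be illustrated with two small pure functions. This is a sketch under stated assumptions: the field names (role, package, name, enrollment_number) and the function names are hypothetical, and a student is matched by enrollment number when present, otherwise by normalized name.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


def _package(value: Optional[float]) -> float:
    # Treat a missing package as lower than any real value.
    return value if value is not None else float("-inf")


def merge_roles(existing: List[Doc], incoming: List[Doc]) -> List[Doc]:
    """Union roles by name; when a role appears twice, the higher package wins."""
    merged = {r["role"]: dict(r) for r in existing}
    for role in incoming:
        current = merged.get(role["role"])
        if current is None:
            merged[role["role"]] = dict(role)
        elif _package(role.get("package")) > _package(current.get("package")):
            current["package"] = role["package"]
    return list(merged.values())


def _student_key(student: Doc) -> str:
    return student.get("enrollment_number") or student["name"].strip().lower()


def merge_students(existing: List[Doc], incoming: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Return (merged students, newly added students)."""
    merged = {_student_key(s): dict(s) for s in existing}
    newly_added: List[Doc] = []
    for student in incoming:
        key = _student_key(student)
        current = merged.get(key)
        if current is None:
            merged[key] = dict(student)
            newly_added.append(merged[key])
            continue
        if _package(student.get("package")) > _package(current.get("package")):
            current["package"] = student["package"]
        if student.get("role"):
            current["role"] = student["role"]  # update role if provided
    return list(merged.values()), newly_added
```

The length of the merged student list gives the total, and a non-empty newly_added list is the condition under which an update event would be emitted. A student recorded once with an enrollment number and once with only a name would not be matched by this simplified key.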
User Management
User management provides soft-deletion semantics:
Add or reactivate users with activation flag
Deactivate users (soft delete)
Retrieve active users and user statistics
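Soft deletion means a deactivated user keeps their record and only an activation flag changes. The sketch below shows that pattern with an in-memory store; the class, method, and field names (is_active, created_at, updated_at) are assumptions for illustration rather than the service's confirmed API.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


class InMemoryUsers:
    """Hypothetical stand-in for the users collection, keyed by user id."""

    def __init__(self) -> None:
        self._users: Dict[int, Dict[str, Any]] = {}

    def add_user(self, user_id: int, username: Optional[str] = None) -> Tuple[bool, str]:
        now = datetime.now(timezone.utc)
        user = self._users.get(user_id)
        if user is None:
            self._users[user_id] = {
                "user_id": user_id,
                "username": username,
                "is_active": True,
                "created_at": now,
                "updated_at": now,
            }
            return True, "added"
        if user["is_active"]:
            return False, "already active"
        # Reactivation flips the flag back; the original record is reused.
        user["is_active"] = True
        user["updated_at"] = now
        return True, "reactivated"

    def deactivate_user(self, user_id: int) -> bool:
        user = self._users.get(user_id)
        if user is None or not user["is_active"]:
            return False
        user["is_active"] = False  # soft delete: the document is kept
        user["updated_at"] = datetime.now(timezone.utc)
        return True

    def get_active_users(self) -> List[Dict[str, Any]]:
        return [u for u in self._users.values() if u["is_active"]]

    def get_users_stats(self) -> Dict[str, int]:
        total = len(self._users)
        active = len(self.get_active_users())
        return {"total": total, "active": active, "inactive": total - active}
```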
Policy Operations
Policy operations manage official policy documents by year:
Upsert keyed by year (insert when no policy exists for that year, update otherwise)
Change detection and logging
Retrieval of published policies
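A minimal sketch of upsert-by-year with change detection follows, using a dictionary keyed by year in place of the policies collection. The function name, the returned status strings, and the timestamp fields are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def upsert_policy(
    store: Dict[int, Dict[str, Any]], policy: Dict[str, Any]
) -> Tuple[bool, str]:
    """Insert or update the policy for policy["year"]; report what happened."""
    now = datetime.now(timezone.utc)
    year = policy["year"]
    existing = store.get(year)
    if existing is None:
        store[year] = {**policy, "created_at": now, "updated_at": now}
        return True, "inserted"
    # Change detection: compare only the fields the caller supplied.
    changed = any(existing.get(key) != value for key, value in policy.items())
    if not changed:
        return True, "unchanged"
    existing.update(policy)
    existing["updated_at"] = now
    return True, "updated"
```

The returned status is what a caller could log. With PyMongo, an update_one call with upsert=True exposes comparable information through the upserted_id and modified_count attributes of its result.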
Serialization Mechanisms for MongoDB ObjectId
The DatabaseService provides a serialization helper to convert ObjectId fields to strings for JSON transport:
Converts top-level _id field to string
Can be extended for nested ObjectId fields if needed
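The top-level conversion can be sketched as below. The helper is referred to elsewhere in this document as _serialize_doc; its real signature is not shown, so the version here is an assumption. FakeObjectId is a stand-in so the example runs without PyMongo; calling str() on a real bson.ObjectId likewise yields its 24-character hex string.

```python
import json
from typing import Any, Dict, Optional


def serialize_doc(doc: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return a shallow copy of `doc` whose top-level _id is a string."""
    if doc is None:
        return None
    out = dict(doc)  # copy so the caller's document is not mutated
    if "_id" in out:
        out["_id"] = str(out["_id"])
    return out


class FakeObjectId:
    """Stand-in for bson.ObjectId (assumption for this sketch only)."""

    def __init__(self, hex_value: str) -> None:
        self._hex = hex_value

    def __str__(self) -> str:
        return self._hex


raw = {"_id": FakeObjectId("0123456789abcdef01234567"), "title": "Placement drive"}
# json.dumps(raw) would raise TypeError; the serialized copy is JSON-safe.
payload = json.dumps(serialize_doc(raw))
```

Nested ObjectId values are left untouched by this version, which matches the top-level scope described above.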
Integration Patterns with Other System Components
DatabaseService integrates with other components through dependency injection:
CLI commands: main.py orchestrates DBClient and DatabaseService creation for email processing and official data updates
NotificationRunner: creates its own DBClient/DatabaseService instance for sending unsent notices
NotificationService: relies on DatabaseService for fetching unsent notices and marking them as sent
OfficialPlacementService: uses DatabaseService to persist scraped official placement data
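One way to picture the injection pattern is a consumer that depends only on the two methods it needs, expressed as a typing.Protocol. This is a sketch: get_unsent_notices and mark_as_sent are named in this document, but NoticeStore, NoticeDispatcher, and FakeNoticeStore are hypothetical names, and NoticeDispatcher only stands in for a service such as NotificationService.

```python
from typing import Any, Callable, Dict, List, Protocol


class NoticeStore(Protocol):
    """The slice of the database service that a notice sender relies on."""

    def get_unsent_notices(self) -> List[Dict[str, Any]]: ...

    def mark_as_sent(self, notice_id: Any) -> bool: ...


class NoticeDispatcher:
    """Hypothetical consumer that receives its store through the constructor."""

    def __init__(self, db: NoticeStore, send: Callable[[Dict[str, Any]], bool]) -> None:
        self._db = db
        self._send = send

    def send_unsent(self) -> Dict[str, int]:
        sent = failed = 0
        for notice in self._db.get_unsent_notices():
            # Mark as sent only after the send callback reports success.
            if self._send(notice) and self._db.mark_as_sent(notice["_id"]):
                sent += 1
            else:
                failed += 1
        return {"sent": sent, "failed": failed}


class FakeNoticeStore:
    """In-memory test double that satisfies NoticeStore structurally."""

    def __init__(self, notices: List[Dict[str, Any]]) -> None:
        self._notices = {n["_id"]: dict(n, sent=False) for n in notices}

    def get_unsent_notices(self) -> List[Dict[str, Any]]:
        return [n for n in self._notices.values() if not n["sent"]]

    def mark_as_sent(self, notice_id: Any) -> bool:
        if notice_id not in self._notices:
            return False
        self._notices[notice_id]["sent"] = True
        return True
```

Because the consumer depends on the protocol rather than a concrete class, a test can pass FakeNoticeStore while production code passes the real service.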
The DatabaseService exhibits low coupling and high cohesion:
Coupling: DatabaseService depends only on DBClient; every MongoDB operation goes through the collection properties that DBClient exposes
Cohesion: High cohesion around data access patterns and business logic for each entity type
External dependencies: PyMongo for MongoDB connectivity, datetime for timestamps, typing for type hints
Connection pooling: DBClient uses PyMongo’s built-in connection pooling; ensure appropriate pool sizing for production workloads
Index usage: The database schema defines indexes for frequent query patterns (e.g., notices by sent flags, jobs by deadlines)
Query optimization: Prefer targeted queries with projections and limits; avoid loading entire collections
Batch operations: Use bulk operations for large-scale updates (e.g., marking notices as sent)
Serialization overhead: ObjectId conversion is lightweight; avoid unnecessary conversions in hot paths
Logging impact: safe_print is used for operational logging; consider structured logging for production monitoring
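As an illustration of the batch guidance above, the helper below builds a single filter/update pair that marks many notices as sent at once. The sent and sent_at field names are assumptions; the resulting pair has the shape expected by PyMongo's collection.update_many(filter, update).

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional, Tuple


def build_mark_sent_update(
    notice_ids: Iterable[Any], now: Optional[datetime] = None
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (filter, update) documents for one bulk 'mark as sent' write."""
    now = now or datetime.now(timezone.utc)
    # Match only the requested notices that are not already marked as sent.
    filter_doc = {"_id": {"$in": list(notice_ids)}, "sent": {"$ne": True}}
    update_doc = {"$set": {"sent": True, "sent_at": now}}
    return filter_doc, update_doc
```

One update_many call with these documents replaces a loop of per-notice update_one calls, which reduces round trips when many notices are sent in the same run.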
Common issues and resolutions:
Connection failures: Verify MONGO_CONNECTION_STR environment variable and network connectivity
Missing collections: Ensure database initialization occurs before accessing collections
Duplicate key errors: Check for existence before inserting notices, and use the upsert operations for jobs and policies; the placement offer merge prevents duplicate offer documents
Serialization errors: Use _serialize_doc for ObjectId fields when returning JSON responses
Performance bottlenecks: Review query plans and add missing indexes as per DATABASE.md
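For the connection-failure case, a quick pre-flight check of the environment variable can rule out the most common misconfiguration before any network call is made. The function below is a hypothetical helper, not part of the documented service; it only verifies that MONGO_CONNECTION_STR is set and uses one of the two standard MongoDB URI schemes.

```python
import os
from typing import Mapping, Optional


def check_mongo_connection_str(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the connection string, or raise RuntimeError with a clear reason."""
    env = os.environ if env is None else env
    value = env.get("MONGO_CONNECTION_STR", "").strip()
    if not value:
        raise RuntimeError("MONGO_CONNECTION_STR is not set")
    if not value.startswith(("mongodb://", "mongodb+srv://")):
        raise RuntimeError(
            "MONGO_CONNECTION_STR must start with mongodb:// or mongodb+srv://"
        )
    return value
```

Passing a plain dictionary as env makes the check easy to exercise in tests without touching the real environment.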
DatabaseService provides a dependency-injected abstraction over MongoDB operations that keeps data access separate from business logic and simplifies integration across the system. It covers CRUD operations for each entity type, the merge logic for placement offers, and ObjectId serialization. Reliable operation in production depends on correct configuration, indexing, and logging.
Practical Examples
Service Initialization
CLI-driven initialization: main.py creates DBClient, connects, and passes it to DatabaseService for email processing and official data updates
Runner-driven initialization: NotificationRunner creates its own DBClient/DatabaseService instance for sending unsent notices
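Both initialization paths follow the same order: create the client, connect, then hand the client to the service. The sketch below mirrors that order with stand-in classes so it runs without a MongoDB server. FakeDBClient, SketchDatabaseService, build_service, and the notices_collection attribute are hypothetical, and the check that the client is already connected is an assumption rather than documented behavior.

```python
from typing import Any, Callable, Dict, List


class FakeDBClient:
    """Stand-in for DBClient; records connection state instead of using MongoDB."""

    def __init__(self, connection_string: str) -> None:
        self.connection_string = connection_string
        self.connected = False
        self.notices_collection: List[Dict[str, Any]] = []

    def connect(self) -> None:
        self.connected = True

    def close_connection(self) -> None:
        self.connected = False


class SketchDatabaseService:
    """Stand-in for DatabaseService; receives its client by injection."""

    def __init__(self, db_client: FakeDBClient) -> None:
        if not db_client.connected:
            raise RuntimeError("connect() must be called before building the service")
        self.db_client = db_client
        self.notices_collection = db_client.notices_collection


def build_service(
    connection_string: str,
    client_factory: Callable[[str], FakeDBClient] = FakeDBClient,
) -> SketchDatabaseService:
    """Create the client, connect it, then inject it into the service."""
    client = client_factory(connection_string)
    client.connect()
    return SketchDatabaseService(client)
```

Keeping the wiring in one function lets a CLI command and a runner share it, or each supply their own client factory.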
Common Operations
Notice management: existence checks, insertion, retrieval, unsent enumeration, and marking as sent
Structured job upsert: merge incoming job data with existing records
Placement offers: bulk save with merge logic and event emission
User management: add/reactivate/deactivate users
Policy operations: upsert by year with change detection
Database Schema Reference
The database schema defines five main collections, each with its own indexes and a data model tailored to its use case: Notices, Jobs, PlacementOffers, Users, and OfficialPlacementData. The architecture overview also lists a Policies collection, which holds the policy documents described under Policy Operations.